Training of Speaker-clustered Acoustic Models for use in Real-time Recognizers
نویسندگان
چکیده
The paper deals with training of speaker-clustered acoustic models. Various training techniques Maximum Likelihood, Discriminative Training and two adaptation based on the MAP and Discriminative MAP were tested in order to minimize an impact of speaker changes to the correct function of the recognizer when a response of the automatic cluster detector is delayed or incorrect. Such situation is very frequent e.g. in online subtitling of TV discussions (Parliament meetings). In our experiments the best cluster-dependent training procedure was discriminative adaptation which provided the best trade-off between recognition results with correct and non-correct cluster detector information.
منابع مشابه
A posteriori and a priori transformations for speaker adaptation in large vocabulary speech recognition systems
The speaker-dependent HMM-based recognizers gives lower word error rates in comparison with the corresponding speaker-independent recognizers. The aim of speaker adaptation techniques is to enhance the speakerindependent acoustic models to bring their recognition accuracy as close as possible to the one obtained with speaker-dependent models. In this paper, we propose a method using test and tr...
متن کاملImprovements in Non-Verbal Cue Identification Using Multilingual Phone Strings
Today’s state-of-the-art front-ends for multilingual speechto-speech translation systems apply monolingual speech recognizers trained for a single language and/or accent. The monolingual speech engine is usually adaptable to an unknown speaker over time using unsupervised training methods; however, if the speaker was seen during training, their specialized acoustic model will be applied, since ...
متن کاملSpeaker-Clustered Acoustic Models Evaluated on GPU for On-line Subtitling of Parliament Meetings
This paper describes the effort with building speaker-clustered acoustic models as a part of the real-time LVCSR system that is used more than one year by the Czech TV for automatic subtitling of parliament meetings broadcasted on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than single gender-independent...
متن کاملSpeaker adaptive training: a maximum likelihood approach to speaker normalization
This paper describes the speaker adaptive training (SAT) approach for speaker independent (SI) speech recognizers as a method for joint speaker normalization and estimation of the parameters of the SI acoustic models. In SAT, speaker characteristics are modeled explicitly as linear transformations of the SI acoustic parameters. The effect of inter-speaker variability in the training data is red...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کامل